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1.  Introduction 


This  paper  describes  UNCM1N,  a  modular  system  of  FORTRAN  subroutines 
for  the  solution  of  the  unconstrained  minimization  problem 

min  / (*)  :  K" -*R.  /i  i\ 

*e/?*  v  ■  ' 

They  are  intended  for  the  case  when  /  is  at  least  twice  continuously 
differentiable,  although  the  derivatives  do  not  have  to  be  available  analytically. 
The  algorithms  may  sometimes  solve  problems  where  /  is  only  once  continu¬ 
ously  differentiable.  No  restriction  is  made  on  the  number  of  variables  n,  but 
since  the  algorithms  store  one  nxn  matrix  and  solve  a  system  of  n  linear  equa¬ 
tions  in  n  unknowns  at  each  iteration,  they  are  intended  mainly  for  problems  of 
small  to  moderate  size,  with  n  for  example  between  2  and  100.  UNCMIN  will  suc¬ 
cessfully  solve  problems  with  n  =  l,  although  perhaps  not  as  efficiently  as  algo¬ 
rithms  intended  specifically  Tor  this  case. 

There  are  a  number  of  other  FORTRAN  routines  available  for  solving  the 
unconstrained  minimization  problem,  including  algorithms  in  the  Harwell,  IMSL. 
MINPACK,  and  NAG  libraries  (see  reference  section),  and  the  packages  by 
Shanno  and  Phua  [1960]  and  Gay  [1982],  This  paper  emphasizes  the  ways  in 
which  ours  is  distinctive.  Books  that  describe  unconstrained  minimization  algo¬ 
rithms  include  Fletcher  [1980],  Gill,  Murray,  and  Wright  [1982],  and  Dennis  and 
Schnabel  [1983]. 

The  distinguishing  feature  of  UNCMIN  is  that  it  is  a  modular  system  of  sub¬ 
routines.  This  means  that  the  user  can  build  a  variety  of  minimization  algo¬ 
rithms  with  UNCMIN,  by  selecting  from  among  several  options  for  the  step  selec¬ 
tion  process,  and  for  the  evaluation  or  approximation  of  7/  and  V8/.  The  possi¬ 
bility  of  selecting  among  alternative  strategies  for  derivative  evaluation,  namely 
user-supplied  analytical  computation,  finite  differences,  or  BFGS  updates,  is 


found  in  severed  other  packages.  The  provision  of  alternative  step  selection  stra¬ 
tegies  that  can  be  used  interchangeably  with  the  remainder  of  the  routine, 
namely  a  line  search,  double  dogleg,  and  a  locally  constrained  optimal  step 
(hereafter  referred  to  as  "hookstep").  is  rare;  in  fact  we  know  of  no  code  that 
provides  both  line  search  and  trust  region  alternatives.  The  combination  of  all 
these  options  is  believed  to  be  unique.  In  our  experience,  these  options  have 
proved  useful  to  users  who  wish  to  find  conveniently  the  best  method  to  use  on 
a  particular  class  of  problems.  The  modular  structure  also  makes  the  system 
very  useful  as  a  testing  and  research  tool,  because  alternative  strategies  can  be 
compared  in  a  controlled  environment,  and  new  approaches  for  various  parts  of 
the  minimization  algorithm  can  be  tested  readily  by  substituting  one  or  more 
new  modules  for  the  corresponding  existing  modules.  Section  2  discusses  the 
modular  structure  of  our  system  in  more  detail. 

The  FORTRAN  subroutines  in  UNCMIN  correspond  closely,  though  not  always 
exactly,  to  the  pseudo-code  in  Appendix  A  of  Dennis  and  Schnabel  [1983],  and 
were  designed  initially  as  a  companion  to  this  book.  The  methods  used  for  solv¬ 
ing  the  unconstrained  minimization  problem  were  not  intended  to  be  new,  but 
rather  a  selection  of  the  best  existing  algorithms.  Inevitably,  various  original 
features  were  introduced,  some  of  which  are  discussed  in  Section  3.  We  also 
paid  careful  attention  to  several  mundane  aspects  of  the  algorithms  that  usually 
are  important  to  users  only  when  they  malfunction,  namely  stopping  criteria, 
selection  of  finite  difference  stepsizes,  and  treatment  of  badly  seeded  problems. 
These  topics  edso  are  included  in  Section  3. 

We  tried  to  pay  careful  attention  to  the  user  interface  with  our  package.  An 
important  feature  is  the  provision  of  a  choice  between  an  easy-to-use  cedling 
sequence,  in  which  the  user  provides  only  n,  / ,  and  the  initial  estimate  x0,  and 
all  tolerances  and  algorithms  options  are  automatically  selected,  and  a  more 
complicated  call  in  which  the  user  nonetheless  needs  to  set  only  those  options 
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desired.  Also  provided  are  checking  of  user-supplied  derivatives,  and  a  variety 
of  output  options.  These  features  are  discussed  briefly  in  Section  4,  and  more 
fully  in  Weiss  [1980]  and  in  a  forthcoming  paper.  In  Section  5  we  discuss  briefly 
the  portability  and  storage  requirements  of  the  code,  as  well  as  suggestions  for 
adapting  it  to  small  computers. 

We  have  also  developed  a  reverse  communication  version  of  our  code,  REV- 
MIN.  This  version  is  algorithmically  identical  to  UNCMIN;  the  difference  is  that 
whenever  a  function  or  analytic  derivative  evaluation  is  required,  the  reverse 
communication  version  returns  to  a  dummy  driver  REVDRV  inserted  between  the 
calling  program  and  REVMIN,  evaluates  the  function  or  derivative,  and  then  re¬ 
calls  REVMIN,  which  resumes  the  optimization  algorithm  at  the  proper  place. 
This  capability  is  required  whenever  the  evaluation  of  /  (x)  requires  additional 
data  from  the  calling  program,  and  it  is  inconvenient  to  pass  this  information 
through  FORTRAN  common.  For  example,  this  is  the  situation  in  the  three  time 
series  codes  in  the  National  Bureau  of  Standards  statistical  library  STARPAC  that 
use  REVMIN.  The  need  for  reverse  communication  and  the  conversion  of  UNCMIN 
to  REVMIN  is  discussed  in  Section  8. 

Finally,  in  section  7  we  present  the  results  of  a  comprehensive  set  of  tests 
using  the  algorithms  in  UNCMIN  on  the  test  problems  in  More’,  Garbow,  and 
Hillstrom  [1981].  In  particular,  we  compare  the  three  step  selection  strategies, 
line  search,  dogleg,  and  hookstep,  in  the  case  when  V/  and  V8/  both  are  approx¬ 
imated  by  finite  differences  and  again  when  V8/  instead  is  approximated  using 
BFGS  updates.  To  our  knowledge,  these  are  the  first  such  comprehensive  and 
controlled  comparative  tests  of  these  global  strategies  for  unconstrained  minim¬ 
ization  to  be  reported.  Gay  [1982]  compares  two  different  BFGS  codes,  one 
using  a  line  search  and  the  other  a  dogleg. 


The  advantages  of  modular  design  to  the  organization,  development  and 
testing  of  any  large  computer  software  system  are  well  known.  The  modular 
organization  of  UNCMIN  has  another  important  advantage  that  we  discuss  in  this 
section.  This  advantage  is  that  we  can  organize  the  unconstrained  minimization 
algorithm  into  sections  that  are  functionally  independent,  such  as  derivative  cal¬ 
culation.  step  selection,  and  checking  stopping  criteria.  We  may  then  supply 
alternative  modules  for  each  of  these  sections,  so  long  as  they  have  the  same 
input  and  output  parameters  and  perform  the  same  function.  We  take  advan¬ 
tage  of  this  directly  by  supplying  alternative  step  selection  and  derivative  calcu¬ 
lation  modules  in  UNCMIN.  Different  combinations  of  these  then  provide  the  user 
with  a  variety  of  possible  algorithms,  which  is  advantageous  since  no  one  algo¬ 
rithm  seems  to  be  best  for  all  problem  classes.  This  structure  permits  a  user  to 
compare  the  effectiveness  of  the  alternative  strategies  for  particular  sections  of 
the  algorithm,  and  determine  which  is  best  suited  for  a  particular  class  of  prob¬ 
lems.  In  addition,  an  algorithm  developer  may  develop  and  test  a  new  version  of 
any  module  by  substituting  it  for  the  existing  module  and  comparing  the  perfor¬ 
mance  of  the  code  using  the  new  and  old  modules. 

To  illustrate  this  design,  consider  the  basic  structure  of  an  iteration  of  our 
algorithm.  In  very  general  terms,  it  is: 

Given  eR" .  the  best  current  estimate  of  the  minimizer; 
ge  =  V/  (xc)  or  a  finite  difference  approximation  to  it; 

H  =  V8/  (zc)  or  a  finite  difference  or  BFGS  approximation  to  it: 

1.  Calculate  the  Newton  step  p  =  -H~lge,  or  a  variation  if  H  is  not  positive 
definite. 

2.  Using  p,  determine  the  next  iterate  xt.  ("Step  selection  strategy") 
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3.  Evaluate  g  h  =  V/  (x+)  or  an  approximation  to  it. 

4.  Decide  whether  to  stop.  If  not: 

5.  Evaluate  H  =  V®/  (x  or  an  approximation  to  it. 

6.  Set  xc  «-  x+,  gc  g+  and  return  to  step  1. 

In  UNCMIN,  these  steps  are  carried  out  by  the  modules  listed  in  Figure  3.1. 

There  are  eighteen  possible  algorithmic  combinations  that  a  user  may 
select:  the  cross  product  of  any  one  of  the  three  step  selection  strategies  (line 
search,  dogleg,  or  hookstep)  with  any  one  of  the  two  gradient  calculation 
methods  (analytic  or  finite  difference)  and  with  any  one  of  the  three  Hessian  cal¬ 
culation  methods  (analytic  or  finite  difference  or  BFGS  update).  Some  of  the 
other  choices  shown  in  Fig.  2.1  are  controlled  wholely  by  the  software.  When 
using  finite  difference  gradients,  the  routine  starts  using  forward  difference 
approximations,  and  switches  to  central  differences  only  if  forward  differences 
seem  to  have  become  too  inaccurate  at  some  iteration  because  very  small  steps 
are  being  taken  in  an  uphill  direction.  When  finite  difference  Hessian  approxima¬ 
tion  is  requested,  the  approximation  is  made  from  analytic  gradients  if  they  are 
available,  from  function  values  otherwise.  When  BFGS  approximations  to  the 
Hessian  are  used,  the  factored  form  of  the  update  (see  e  g.  Goldfarb  [1978])  is 
used  if  the  step  selection  strategy  is  the  line  search  or  dogleg;  this  causes  each 
iteration  to  require  only  0(n2)  operations.  The  factored  form  does  not  interface 
well  with  the  hookstep  strategy,  so  an  unfactored  update  is  used  with  this  step 
selection  method. 

Of  the  eighteen  user-controlled  algorithmic  possibilities,  three  probably  are 
unrealistic,  the  combinations  involving  analytic  Hessians  and  finite  difference 
gradients.  Any  of  the  other  fifteen  combinations  might  be  used  in  practice.  The 
default  choice  (see  Section  5)  is  the  line  search  with  finite  difference  gradients 
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Figure  2.1  Modules  for  Unconstrained  Minimization 
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and  BFGS  approximation  to  the  Hessian,  but  our  users  have  used  many  other 
combinations.  A  computational  comparison  of  the  three  step  selection  stra¬ 
tegies  is  given  in  Section  7. 

The  modular  structure  of  UNCMIN  also  has  been  helpful  in  our  development 
and  testing  of  new  strategies  for  unconstrained  minimization,  because  we  can 
often  form  the  desired  new  algorithm  by  replacing  just  one  module  in  UNCMIN. 
Then  we  can  compare  the  performance  of  the  two  codes  which  are  identical  in 
all  other  respects.  We  have  used  the  modular  system  to  test  new  secant  updates 
(replacing  the  BFGS  module  in  step  5),  to  test  new  step  selection  strategies 
(inserting  a  new  option  in  step  2),  and  to  test  strategies  for  handling  indefinite 
Hessians  (replacing  the  first  part  of  step  l).  Using  modular  replacement  also 
significantly  decreases  the  time  required  to  construct  test  codes. 

Another  advantage  of  our  modular  design  is  that  several  of  the  modules  can 
also  be  used  in  a  similar  system  of  algorithms  for  solving  simultaneous  systems 
of  nonlinear  equations.  Most  importantly,  all  the  step  selection  algorithms  of 
step  2  can  be  used  without  change.  This  would  significantly  reduce  the  time  to 
construct  additional  software  for  solving  systems  of  nonlinear  equations.  The 
pseudo-code  in  the  appendix  of  Dennis  and  Schnabel  [1963],  which  is  intended 
for  both  unconstrained  minimization  and  nonlinear  equations,  also  shares 
modules  in  this  fashion. 

Figure  2.1  does  not  list  all  the  modules  in  UNCMIN;  in  particular,  modules 
involved  in  the  initialization  phase  of  the  algorithm  and  severed  service  modules 
have  been  omitted.  UNCMIN  contains  a  total  of  36  subroutines;  a  pared  down 
version  is  described  in  Section  4. 
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3.  Interesting  Algorithmic  Features 

The  methods  implemented  in  UNCMIN  are  described  in  detail  in  Dennis  and 
Schnabel  [1983].  Our  intention  in  this  book  was  to  collect  the  best  existing  algo¬ 
rithms  rather  than  to  propose  new  one:.  Inevitably,  some  n?w  features  we**e 
introduced.  In  this  section  we  mention  briefly  the  algorithms  available  in  UNC¬ 
MIN.  and  some  of  our  innovations.  We  assume  that  the  reader  of  this  section  is 
familiar  with  modem  algorithms  for  unconstrained  minimization;  some 
comprehensive  references  are  Fletcher  [i960],  Gill.  Murray,  and  Wright  [1901], 
and  Dennis  and  Schnabel  [1983]. 

The  three  step  selection  strategies,  a  line  search,  a  dogleg,  and  a  hookstep 
(locally  constrained  optimal  step),  are  all  fairly  standard.  The  line  search  is  a 
normal  backtracking  line  search  using  safeguarded  quadratic  interpolation  for 
the  first  backtrack  and  safeguarded  cubic  interpolation  for  any  subsequent 
backtracks  at  each  iteration.  It  terminates  when  the  condition 

c)  +  l(T4V/(xc)r  (x+  -  xc)  (3.1) 

is  satisfied  for  the  first  time.  A  second  common  line  search  condition 

V  (*+)r  (*♦  -  xc)  &  Vf  (xc)r  (x+  -  xc),  /3e(l(T4.l)  (3.2) 

is  not  enforced  explicitly  by  the  code.  Our  practical  experience  is  that  (3.2)  is 

virtually  always  satisfied  by  the  first  step  to  satisfy  (3.  l).  and  that  when  it  is  not. 
continuing  the  line  search  to  enforce  (3.2)  (with,  say.  =  0.9)  makes  virtually  no 
difference  In  the  ultimate  efficiency  of  the  algorithm.  Also,  our  line  search  algo¬ 
rithm  is  globally  convergent  without  requiring  (3.2)  due  to  the  safeguards  in  the 
backtracking  strategy  (see  Shultz,  Schnabel,  and  Byrd  [1982]),  except  in  the 
BFGS  case  where  no  general  global  convergence  result  for  a  line  search  algo¬ 
rithm  exists. 

The  dogleg  strategy  implemented  is  the  double  dogleg  of  Dennis  and  Mei 
[1970].  The  hookstep  algorithm  is  a  minor  modification  of  the  Levenberg- 
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Marquardt  algorithm  of  More'  [1978]  for  nonlinear  least  squares.  Both  the 
dogleg  and  the  hookstep  algorithms  can  assume  that  the  model  Hessian  matrix 
is  positive  definite  for  reasons  indicated  below.  The  trust  region  updating  stra¬ 
tegy  is  fairly  standard,  and  is  documented  in  Dennis  and  Schnabel  [1983].  Both 
trust  region  algorithms  satisfy  the  conditions  of  Shuitz,  Schnabel,  and  Byrd  for 
global  convergence. 

The  formulas  for  finite  difference  derivative  approximations  are  the  stan¬ 
dard  ones.  Stepsizes  are  calculated  automatically  according  to  the  following 
rules.  For  forward  difference  approximations  to  the  gradient  (or  Hessian 
approximation  usi.  g  analytic  gradients),  the  stepsize  used  to  perturb  the  Xth 
component  x*  of  the  current  vector  zc  is 

hi  =  sign (aq)  •  io -MOfOTS/z  *  maxj  j  typxil  (3.3) 

where  NDIG1TS  is  the  number  of  accurate  digits  in  the  objective  function  /(*) 

and  typxi  is  a  typical  magnitude  of  the  ith  component  of  x .  If  the  user  does  not 
supply  values  for  N DIGITS  and  typxt,  the  default  values  N DIGITS  - 
-log l0(macheps)  -  corresponding  to  full  accuracy  in  /  (x)  -  and  typa^  -  1  are 
used.  For  justification  of  (3.3),  see  for  example  Dennis  and  Schnabel  [1983].  For 
central  difference  approximation  of  the  gradient,  or  for  Hessian  approximation 
from  function  values,  the  stepsize  is  (3.3)  with  NDIGITS/ 2  replaced  by  N DI¬ 
GITS/  3.  In  our  experience,  these  stepsizes  are  quite  satisfactory  unless  the 
user  assumes  the  default  value  for  NDIGITS  when  in  fact  far  fewer  digits  of  /(x) 
are  accurate.  In  this  case,  entirely  inaccurate  derivative  approximations  may 
result.  If  the  user  does  not  know  the  approximate  value  of  NDIGITS,  it  may  be 
estimated  easily  and  accurately  by  the  routine  of  Hamming  [1971]  given  in  Gill, 
Murray,  and  Wright  [1981]. 

When  finite  difference  gradient  approximation  is  selected,  our  software 
starts  with  forward  difference  approximation,  and  switches  to  central 
differences  if  at  some  iteration  the  steplength  or  trust  region  becomes  so  small 


that  ||x«  -xc]|  is  within  the  stopping  tolerance,  even  though  /(x,)  does  not 
satisfy  (3.1).  In  this  case,  the  iteration  is  restarted  using  a  central  difference 
gradient,  and  central  difference  gradients  are  used  thereafter.  In  our  experi¬ 
ence,  this  switch  is  invoked  occasionally  at  the  final  iterations  of  the  algorithm, 
especially  if  the  stopping  tolerance  for  the  gradient  is  so  stringent  that  it  cannot 
be  satisfied  using  forward  difference  gradient  approximations. 

When  BFGS  Hessian  approximations  are  requested,  a  scaled  version  of  the 
identity  matrix  is  the  initial  approximant.  All  the  approximants  are  positive 
definite;  in  the  rare  case  that 

(z+  -xc)T  (V/(x+)  -V/(zc))<  0 
the  update  at  that  iteration  is  skipped. 

A  still  unresolved  problem  in  unconstrained  minimization  is  what  to  do  when 
analytic  or  finite  difference  Hessians  are  used  and  the  current  value,  say  H,  is 
not  positive  definite  at  some  iteration.  Various  strategies  have  been  proposed, 
see  for  example  Gill,  Murray,  and  Wright  [1981],  Gay  [1981],  Sorensen  [1982], 
More'  and  Sorensen  [1981],  and  Shultz,  Schnabel,  and  Byrd  [1982].  These  are 
still  being  tested  and  compared.  We  use  a  fairly  simple  approach  related  to  the 
approaches  of  Gill,  Murray,  and  Wright  and  to  the  hookstep  algorithm.  A  Chole- 
sky  factorization  of  H  is  attempted;  it  results  in  the  factorization 

LLt  =  H  +  D 

where  D  is  a  non-negative  diagonal  matrix  that  is  zero  if  H  is  positive  definite. 
The  factorization  algorithm  is  similar  to  Gill,  Murray,  and  Wright’s.  Then,  if  D* 0, 
the  Cholesky  factorization  of 

H  =H  +  minfsdd,  ||Z?||,J'/ 

is  calculated,  where  sdd  is  that  smallest  positive  number  such  that  H+sdd*I  is 
"safely"  positive  definite.  Then  H  replaces  H  for  the  remainder  of  the  iteration. 
This  approach  is  more  expensive  than  Gill,  Murray,  and  Wright’s,  but  it  is  well 


justified  by  its  relation  to  the  optimal  step  approach,  and  it  assures  global  con¬ 
vergence.  Various  less  expensive  but  more  complex  strategies  that  assure  glo¬ 
bal  convergence  are  described  in  Shultz,  Schnabel,  and  Byrd. 

We  paid  careful  attention  to  two  mundane  but  important  aspects  of  minimi¬ 
zation  algorithms,  scaling  and  stopping  criteria.  The  software  package  is  coded 
so  that  if  the  user  inputs  the  typical  magnitude  typxx  of  each  component  of  x, 
the  performance  of  the  package  is  then  equivalent  to  what  would  result  from 
redefining  the  independent  variable  in  the  user's  function  with 

1  /typxx 

xtcaUi  ~ 

1/ typxn 

and  running  the  package  without  scaling.  The  default  value  for  each  typxt  is  1. 
In  our  experience,  users  can  usually  supply  appropriate  values  for  typxi,  and 
most  badly  scaled  problems  can  be  solved  successfully  using  this  approach. 
Strategies  in  which  the  code  estimates  typx  at  each  iteration  are  still  not  well 
established,  but  such  a  strategy  could  be  incorporated  into  UNCMIN  merely  by 
adding  a  rescaling  module  called  once  at  the  start  of  each  iteration. 

There  are  five  stopping  criteria  in  UNCMIN:  l)  V/(x+) SO;  2)  x,.-xc ;  3)  the 
package  could  not  satisfy  (3.1)  at  the  last  iteration;  4)  iteration  limit  exceeded; 
and  5)  divergence  suspected  (/  (z)  unbounded  below  or  /  (x)  approaches  a  finite 
Value  asymptotically  from  above).  We  attempted  to  make  the  first  two  tests  as 
scale  independent  as  possible;  see  Dennis  and  Schnabel  for  details.  In  our 
experience,  when  the  code  stops  due  to  V/(z>)-0,  it  is  almost  always  near  a 
local  minimizer.  When  it  stops  because  x*-xe  it  is  usually  near  a  solution;  how¬ 
ever  this  tolerance  should  be  set  quite  small  since  these  algorithms  sometimes 
take  very  small  steps  while  still  far  from  the  solution.  When  the  algorithm  stops 
because  the  last  iteration  could  not  satisfy  (3.1),  it  is  sometimes  near  a  solution 
and  unable  to  achieve  additional  accuracy  due  to  finite  precision  affects;  this 
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occurs  most  often  when  finite  difference  gradients  are  used,  and  the  accuracy 
requested  is  too  high.  Divergence  is  tested  by  imposing  a  very  large  maximum 
step  size;  if  five  consecutive  steps  are  at  least  99%  of  this  size,  divergence  is 
suspected. 


4.  User  Oriented  Features 

We  have  attempted  to  make  UNCMIN  easy  and  helpful  to  use.  This  section 
discusses  three  features  of  UNCMIN  included  for  this  reason  :  alternative  calls  to 
UNCMIN,  automatic  checking  of  user-supplied  analytic  derivatives,  and  various 
levels  of  printed  output. 

UNCMIN  may  be  called  either  with  a  very  simple  calling  sequence,  or  with  a 
more  complex  sequence  in  which  the  user  still  chooses  the  amount  of  informa¬ 
tion  supplied  and  relies  upon  defaults  tor  the  remaining  variables.  The  simple 
calling  sequence  is 

CALL  OPTIFO  (NR.  N.  X.  FCN,  XPLS,  FPLS ,  GPLS.  ITRMCD ,  A,  WRK). 

The  user  supplies  the  problem  dimension  N,  the  matrix  and  vector  work  arrays 
A  and  WRK.  the  row  dimension  NR  of  A,  the  starting  value  X,  and  the  objective 
function  FCN.  (The  other  four  parameters  are  output  parameters  discussed 
below.)  OPTIFO  then  calls  the  subroutine  DFAULT  to  assign  default  values  to  all 
algorithmic  option  parameters  (method  of  derivative  approximation  and  step 
selection  strategy),  stopping  tolerances,  scaling  information,  level  of  output,  and 
several  miscellaneous  tolerances.  Using  these  defaults  it  calls  subroutine 
OPTIFO  which  does  the  minimization.  To  obtain  control  of  any  or  all  of  these 
parameters,  the  user  instead  writes  a  driver  that  first  cedis  DFAULT  to  set  all 
Input  parameters  to  their  default  values,  then  changes  only  those  parameters 
for  which  non-default  values  are  desired,  and  finedly  cedis  0PT1F9,  using  the 
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calling  sequence 

CALL  0TPIF9  (NR.  N,  X,  FCN.  DlFCN.  D2FCN,  TYPX,  TYPF,  METHOD. 
tEXP,  USG,  N DIGIT,  ITNLIM,  IAGFLG,  IAHFLG,  IPR,  DLT .  GRADTL, 
STEPMX.  STEPTL,  XPLS,  FPLS,  GPLS,  ITRMCD,  A.  V/RK). 

Tl>i a  allows  the  user  to  be  concerned  with  only  the  minimum  number  of  parame¬ 
ters  necessary.  For  further  details,  see  Weiss  [1980]. 

Automatic  checking  of  derivatives  is  provided  because  in  our  experience, 
incorrectly  coded  derivatives  are  a  common  cause  of  failure  of  optimization  rou¬ 
tines.  When  an  analytic  gradient  or  Hessian  is  supplied,  UNCMIN  automatically 
compares  its  value  at  the  starting  point  to  a  finite  difference  approximation.  If 
any  component  of  an  analytic  derivative  differs  by  more  than  1%  from  the 
corresponding  finite  difference  component  (with  safeguarding  for  near-zero 
values),  UNCMIN  returns  with  an  error  termination  code.  The  user  may  cause 
derivative  checking  to  be  skipped  by  supplying  an  appropriate  input  parameter 
value. 

The  package  returns  a  termination  code  ITRMCD,  its  best  approximation  to 
the  minimizer  XPLS,  and  the  function  and  gradient  values  FPLS  and  GPLS  at 
XPLS.  The  default  level  of  printed  output  consists  of  reporting  the  input  param¬ 
eters,  a  cause  of  termination  message,  and  parameter,  function  and  gradient 
values  at  the  initial  and  final  iterations.  By  varying  the  input  parameter  MSG. 
the  user  may  suppress  printed  output  entirely,  or  may  cause  the  results  after 
each  intermediate  iteration  to  be  printed.  Even  more  detailed  output  useful 
mainly  for  algorithm  development  and  testing  may  be  obtained  by  activating 
output  statements  that  are  imbedded  as  comments  in  the  code.  For  more 
details,  see  Weiss  [i960]. 
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5.  Computer  Environment  Considerations 

This  section  discusses  the  language, storage  and  floating  point  environment 
requirements  of  UNCMIN. 

The  entire  package  is  coded  in  ANSI  1966  standard  FORTRAN,  and  elicits 
only  one  objection  from  the  PFORT  verifier  (Ryder  [1974]).  In  the  subroutine 
FSTOFD,  a  formal  parameter  which  is  annxm  matrix  is  allowed  to  correspond  in 
some  calls  to  an  actual  parameter  which  is  an  n  -vector.  This  allows  FSTOFD  to 
be  used  in  calculating  both  finite  difference  gradients  and  Hessians.  The  excep¬ 
tion  is  acceptable  to  virtually  all  FORTRAN  compilers  and  the  violation  wets  made 
deliberately  on  this  basis.  The  package  also  is  acceptable  to  at  least  some  FOR¬ 
TRAN  77  compilers,  though  it  is  not  standard  due  to  the  use  of  Hollerith  con¬ 
stants. 

The  reed  variables  in  UNCMIN  are  all  single  precision,  suitable  for  the  equip¬ 
ment  of  vendors  like  Cray  or  CDC  whose  single  precision  numbers  have  14  or 
more  base  10  places.  However  the  single  precision  values  of  IBM  or  DEC 
machines,  roughly  7  base  10  places,  sometimes  are  insufficient  for  uncon¬ 
strained  optimization  problems.  A  double  precision  version  of  UNCMIN  will  be 
available  for  such  machines.  The  only  machine  dependent  constants  used  by  the 
code  are  functions  of  the  machine  epsilon,  which  is  calculated  by  the  code. 

The  code  contains  approximately  3200  lines,  55 %  of  which  are  comments. 
On  some  small  machines,  the  object  code  produced  is  too  long  and  it  is  prefer¬ 
able  to  load  only  those  modules  which  are  being  used.  For  example,  a  subset 
consisting  of  only  the  modules  required  for  the  default  algorithmic  options 
(finite  difference  gradients,  secant  Hessian  approximations,  line  search)  has 
only  2200  lines.  We  have  found  that  our  users  easily  construct  such  pared  down 
versions  from  our  package. 
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The  code  requires  that  the  user  supply  one  matrix  of  size  at  least  nxn  and 
9  vectors  of  size  at  least  n  for  work  space,  where  n  is  the  problem  dimension. 
The  matrix  and  one  of  the  vectors  are  used  to  store  one  nxn  symmetric  matrix 
and  one  nxn  lower  triangular  matrix.  It  would  be  possible  to  implement  a  few  of 
the  methods  available  in  the  package  using  only  n8/2+0(n)  storage,  but  this 
savings  is  not  made  since  it  is  incompatible  with  the  modular  structure  of  our 
code. 


6.  Reverse  Communication 

The  term  reverse  communication  has  been  used  by  several  authors  (Krogh 
[1969];  Gill,  Murray,  Picken,  and  Wright  [1979];  Gay  [19B0J;  More'  [1980,  1902]) 
to  describe  a  program  structure  in  which  control  is  returned  to  the  top  level  of 
the  package  or  to  the  calling  program  on  every  occasion  that  a  new  function 
value  is  required.  This  capability  was  required  to  perform  unconstrained  minim¬ 
izations  within  several  time  series  codes  in  the  National  Bureau  of  Standard's 
STARPAC  library.  For  this  reason,  we  produced  a  reverse  communication  ver¬ 
sion  of  UNCMIN  called  REVMIN. 

The  concept  and  use  of  reverse  communication  are  quite  old.  Two  early 
uses  of  which  we  are  aware  were  in  Subroutine  VD01  of  the  Harwell  Subroutine 
Library  (see  references),  written  in  1964  by  M.  J.  D.  Powell  to  minimize  a  non¬ 
linear  function  of  one  variable,  and  in  a  contour  plotting  subroutine  written  in 
1965  by  C.  Lawson,  N.  Block,  and  R.  Garrett  [1965]  at  Jet  Propulsion  Laboratory. 
Little  has  been  written  about  the  reverse  communication  process,  however, 
•specially  about  the  transformation  of  a  non-reverse  communication  package 
into  a  reverse  communication  package.  Therefore,  this  section  discusses  in 
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some  detail  the  need  for  REVM1N  and  the  transformation  of  UNCMIN  into  REVMIN. 

In  simplified  terms,  the  time  series  applications  have  the  following  form. 
The  user  calls  a  time  series  modelling  routine,  say  TIME,  passing  into  TIME  a 
large  amount  of  data,  say  TDATA  (see  Fig.  8.1).  One  task  the  time  series  pack¬ 
age  then  performs  is  to  set  up  and  solve  a  maximum  likelihood  problem  of  the 
form 

min  TIMEFN  (*.  TDATA) 

X  €i?n 

where  TIMEFN  is  a  function  that  it  constructs.  The  difficulty  this  presents  to 
UNCMIN  or  any  standard  minimization  routine  is  that  TDATA  must  be  available 
to  TIMEFN  at  each  place  where  TIMEFN  is  called  within  UNCMIN.  One  way  to 
accomplish  this  is  to  pass  TDATA  through  UNCMIN  to  each  subroutine  that  con¬ 
tains  a  cadi  to  the  objective  function,  by  adding  the  parameter  TDATA  to  each 
subroutine  argument  list  along  the  path  to  each  such  calling  routine.  This  is 
undesirable  since  it  would  require  a  separate  source  and  object  code  version  of 
UNCMIN  each  time  TDATA  changed  form.  (We  had  three  different  time  series 
applications,  each  with  a  different  version  of  TDATA  consisting  of  several  large 
vectors  and  matrices.) 

A  second  solution  is  to  pass  TDATA  directly  from  TIME  to  TIMEFN  via  a 
labelled  COMMON  block.  However  since  TDATA  is  a  parameter  to  TIME,  it  cannot 
be  defined  by  TIME  to  lie  in  COMMON  and  would  need  to  be  copied  into  a  work 
array  in  COMMON  instead.  Since  TDATA  typically  contains  vectors  with  several 
thousand  data  values  in  our  applications,  this  doubling  of  storage  is  undesirable. 

A  third  solution  is  to  have  the  user’s  program,  which  constructs  TDATA, 
pass  it  to  both  TIME  and  TIMEFN  via  labelled  COMMON.  This  avoids  the  need  to 
copy  TDATA  and  no  extra  storage  is  used.  A  disadvantage  of  this  solution  is  that 
many  FORTRAN  software  packages  try  to  avoid  passing  arguments  through  COM¬ 
MON  because  of  potential  for  name  conflicts  between  COMMONS.  In  addition,  the 
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Figure  6.1  - 

Non-reverse  communication 
example 


Figure  6.2  -- 

Reverse  communication  version 
of  same  example 
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fact  that  COMMON  blocks  may  not  contain  dynamically  dimensioned  arrays 
severely  limits  the  applicability  of  this  solution 

The  reverse  communication  solution  is  indicated  in  Figure  6.2.  Each  time  a 
subroutine  in  UNCMIN  needs  a  value  of  a  T1MEFN  (or  an  analytic  derivative,  if 
supplied  by  the  user)  the  package  records  its  state,  returns  up  through  UNCMIN 
to  a  dummy  driver,  REVDRV,  between  UNCMIN  and  TIME,  calls  TIMEFN  using 
TDATA  (which  has  been  passed  as  a  parameter  to  REVDRV),  and  returns  with  the 
function  value  down  to  the  point  in  UNCMIN  that  requested  it.  To  accomplish 
this.  UNCMIN  is  converted  by  the  process  described  below  into  a  reverse  com¬ 
munication  package  REVMIN  that  implements  the  identical  optimization  algo¬ 
rithm  but  obtains  values  of  user-supplied  functions  by  reverse  communication. 
If  TDATA  changes  form,  the  only  changes  required  to  REVDRV  are  to  change  its 
formal  parameter  list,  its  declaration  of  TDATA.  and  its  call  to  TIMEFN;  REVMIN 
is  unaffected  by  the  form  of  TDATA.  We  emphasize  that  the  normal  and  reverse 
communication  versions  of  a  numerical  algorithm  perform  identical  calcula¬ 
tions,  and  differ  only  in  the  method  of  obtaining  values  of  user-supplied  func¬ 
tions. 

To  accomplish  the  bubbling  up  and  down  through  the  package  (the  wiggly 
arrows  in  Fig.  6.2),  a  RETURN  statement  is  inserted  in  place  of  each  call  to  a 
user-supplied  function,  say  the  one  in  SUB2  in  Fig.  6,1.  In  addition,  a  local  vari¬ 
able  (say  SPOT)  is  added  to  SUB2  to  identify  the  location  in  SUB2  of  a  request 
for  a  function  value;  it  will  be  used  to  direct  the  flow  of  control  to  this  statement 
at  the  end  of  the  bubbling  down  phase.  Also,  one  new  parameter  (say  REVPAR) 
is  added  to  SUB2  to  indicate  to  the  level  above  whether  this  is  a  normal  return 
( REVPAR  -  0)  or  a  return  to  obtain  a  function  value  ( REVPAR  >  0);  if  REVPAR  is 
positive,  its  value  indicates  to  REVDRV  whether  the  function,  gradient,  or  Hes¬ 
sian  should  be  evaluated.  Finally,  the  preservation  of  the  values  of  all  local  vari¬ 
ables  in  SUB2  is  ensured  by  placing  them  in  a  labelled  COMMON  block  that  is 
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shared  with  REVDRV.  (In  FORTRAN  77  this  can  be  accomplished  by  means  of  the 
SAVE  statement  instead)  While  this  may  seem  to  contradict  the  avoidance  of 
COMMON  mentioned  above,  the  difference  is  that  the  user  does  not  use  COMMON. 
Another  solution  is  to  pass  all  the  local  variables  to  REVDRV  through  the  parame¬ 
ter  lists.  Then  all  subroutines  and  subroutine  calls  on  the  path  from  REVDRV 
down  to  SUB2,  in  our  example  only  SUBl,  are  modified  similarly.  The  statement 
“IF  (REVPAR  .GT.  0)  RETURN”  is  inserted  following  each  subroutine  call  leading 
down  the  path,  and  the  local  variable  SPOT  is  used  to  identify  this  subroutine 
call,  to  which  SUBl  will  return  control  on  the  way  back  down.  In  addition,  all  the 
local  variables  in  SUBl  are  saved,  and  REVPAR  is  added  to  the  parameter  list  so 
that  it  may  be  passed  on  up. 

In  this  manner,  the  flow  of  control  is  returned  to  REVDRV.  Here.  REVPAR  is 
used  by  REVDRV  to  select  the  user-supplied  function  to  evaluate,  with  the  proper 
parameters  After  the  function  call,  REVDRV  calls  REVMIN,  which  must  return 
with  the  function  value  to  the  statement  in  SUB2  following  the  point  where  the 
function  evaluation  was  requested.  This  is  accomplished  by  a  GO  TO  statement 
conditional  on  the  values  of  the  parameter  REVPAR  and  the  local  variable  SPOT 
which  is  placed  at  the  beginning  of  each  subroutine  on  the  wiggly  path  between 
REVDRV  and  SUB2.  If  REVPAR  >  0,  meaning  that  the  bubbling  down  phase  of 
reverse  communication  is  in  process,  a  computed  GO  TO,  based  on  the  value  of 
SPOT,  takes  each  subroutine  between  REVDRV  and  SUBS  immediately  to  the 
subroutine  call  that  will  send  it  to  the  next  subroutine  down  along  the  wiggly 
path.  A  similar  statement  in  SUBS  sends  it  to  the  statement  following  the  point 
where  the  function  invocation  occurred.  If  REVPAR  =  0,  meaning  reverse  com¬ 
munication  is  not  in  process,  the  subroutine  starts  with  its  original  first  state¬ 
ment.  (The  details  of  passing  the  normal  function  a-guments  up  and  the  func¬ 
tion  value  down  have  been  omitted  here.) 
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The  transformation  from  a  normal  system  of  subroutines  to  a  reverse  com¬ 
munication  version  may  be  accomplished  manually;  it  is  facilitated  by  using  a 
tool  such  as  DAVE  (Osterweil  and  Fosdick  [1976])  to  identify  all  the  paths  in  the 
system  to  calls  of  user-supplied  functions.  We  believe  it  would  also  be  possible 
to  accomplish  the  transformation  by  an  automatic  source  to  source  transforma¬ 
tion  tool.  A  problem  that  must  be  recognized,  however,  is  the  complications 
that  can  arise  due  to  the  aliasing  of  function  names  when  they  are  passed  as 
parameters.  This  creates  the  problem  that  in  a  statement  like  CALL  FN  (...)  in 
SUB2  in  Fig.  6.1.  FN  may  refer  to  any  one  of  the  user-supplied  functions  whose 
names  were  input  by  the  user,  depending  perhaps  on  the  execution  path  taken 
to  reach  SUB2.  For  example,  this  occurs  in  UNCM1N  where  the  first  order  for¬ 
ward  difference  routine  FSTOFD  includes  the  input  parameter  FCN  and  several 
calls  of  FCN .  Here  FCN  may  equate  to  the  name  of  either  the  user-supplied 
objective  function  or  the  user-supplied  gradient  function,  depending,  respec¬ 
tively,  on  whether  FSTOFD  was  called  by  the  driver  to  compute  a  gradient  or  by 
the  Hessian  approximation  algorithm  to  compute  a  Hessian.  In  a  case  like  this, 
the  transformation  of  UNCMIN  into  REVMIN  must  arrange  for  each  function 
parameter  to  be  accompanied  by  an  integer  parameter  identifying  the  function. 
Additional  complications  arise  from  the  need  to  ensure  that  all  arguments  to 
which  the  optimizer  might  apply  each  function  are  accessible  in  REVDKV.  Furth¬ 
ermore,  each  distinct  set  of  arguments  to  which  a  reverse  communication  func¬ 
tion  may  be  applied  represents  a  distinct  call  required  in  REVDRV.  Thus,  the 
three  reverse  communications  functions  of  REVMIN  (the  function  FCN,  gradient 
DlFCN .  and  Hessian  D2FCN)  produce  seven  distinct  calls  in  REVDRV,  since  the 
function  is  evaluated  on  two  different  sets  of  arguments,  the  gradient  on  four, 
and  the  Hessian  on  one. 

There  is  also  at  least  one  minor  technical  problem  in  the  transformation  to 
a  reverse  communication  system.  If  the  function  call  is  inside  a  DO  loop  (this  is 
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common,  for  example  in  finite  difference  routines),  the  loop  must  be  rewritten 
in  the  primitive  way  since  most  versions  of  FORTRAN  do  not  permit  re-entry  into 
the  body  of  the  loop.  Analogous  problems  may  occur  with  other  control  struc¬ 
tures,  for  example  a  call  embedded  in  a  logical  IF  statement. 

The  additional  costs  of  using  a  reverse  communication  version  of  a  system 
of  numerical  algorithms  are  a  small  amount  of  additional  storage,  a  fairly  small 
increase  in  the  size  of  the  source  and  object  code  (in  our  case,  the  number  of 
non-comment  source  lines  increased  15%,  from  1670  to  1917),  and  a  consider¬ 
able  addition  to  the  cost  of  executing  the  algorithm.  Using  the  rough  timing 
data  available  on  our  CYBER  170/750,  the  increase  in  CPU  time  when  running 
the  same  test  problems  using  UNCMIN  and  REVMIN  averaged  between  25%  and 
50%  for  problems  where  the  function  is  inexpensive  to  evaluate,  although  occa¬ 
sionally  the  change  was  outside  this  range.  The  additional  cost  was  at  the  higher 
end  of  this  range  when  both  the  gradient  and  Hessian  were  approximated  by 
finite  differences,  since  this  requires  many  function  calls.  It  was  at  the  lower  end 
when  only  the  gradient  was  approximated  by  finite  differences  and  the  Hessian 
by  secant  updates,  and  would  have  been  lower  yet  if  analytic  gradients  had  been 
supplied.  Our  test  problems  had  dimension  between  2  and  20;  when  using  finite 
difference  gradients,  the  additional  cost  of  reverse  communication  increases 
with  the  dimension  of  the  problem.  Of  course,  if  the  objective  function  is  at  all 
expensive  to  evaluate,  as  it  is  in  many  real-world  problems  such  as  our  time 
series  example,  the  additional  execution  cost  due  to  reverse  communication 
quickly  becomes  negligible. 

Finally,  we  mention  that  the  internal  procedure  mechanism  that  has  been 
proposed  as  a  part  of  FORTRAN  BX  would  eliminate  the  need  for  reverse  com¬ 
munication.  This  is  because  internal  procedures  would  be  allowed  global  access 
to  variables  in  other  subroutines  through  the  INHERIT  statement.  In  our  exam¬ 
ple,  TIMEFN  would  become  an  internal  function  with  argument  x,  but  TDATA 


would  no  longer  be  an  argument  of  TIMEFN.  Instead.  TIMEFN  would  include  the 
declaration  INHERIT  TDATA  which  would  give  TIMEFN  access  to  TDATA  whenever 
it  was  called.  Thus,  the  non-reverse  communication  version  of  the  optimizer 
(Fig.  6.1)  would  suffice. 


7.  Comparative  Testing 

The  provision  of  alternative  modules  for  derivative  evaluation  and  step 
selection  in  UNCMIN  affords  an  excellent  controlled  environment  for  compara¬ 
tive  testing.  In  this  section,  we  summarize  the  results  of  three  comparative 
tests  we  conducted  using  UNCMIN  with  the  test  problems  from  More’,  Garbow, 
and  Hillstrom  [1901]. 

The  first  test  compared  the  performance  of  our  algorithms  using  analytic 
gradients  and  Hessians  to  the  performance  of  the  same  algorithms  using  finite 
difference  gradients  and  Hessians,  on  a  small  number  of  test  problems.  Table 
7.1  shows  that  in  almost  all  cases,  there  is  very  little  or  no  difference  in  the 
number  of  iterations  or  function  evaluations  required  by  a  particular  method  on 
a  particular  problem  (if  the  extra  function  evaluations  used  in  the  finite 
difference  approximations  are  excluded).  This  is  true  whether  the  step  selection 
strategy  is  the  line  search,  dogleg,  or  hookstep.  Occasionally  there  is  a  substan¬ 
tial  difference;  ordinarily  this  is  due  to  the  sensitivity  of  the  test  problems  to 
small  changes  in  the  sequence  of  iterates.  In  two  cases  a  larger  difference 
occurred  because  the  finite  difference  method  switched  to  central  difference 
gradients.  (The  code  switches  to  central  difference  gradients  when  it  detects 
that  a  step  using  a  forward  difference  gradient  is  in  an  uphill  direction.  This 
decision  usually  only  occurs  after  10  to  20  backtracks  and  evaluations  of  /(x)  at 
one  iteration  have  failed  to  decrease  the  current  function  value,  thus  causing 
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large  differences  in  Table  7.1 .) 

In  our  experience,  the  results  of  Table  7.1  are  fairly  typical;  as  long  as  the 
gradient  stopping  tolerance  is  within  the  accuracy  obtainable  using  finite 
differences  gradients,  a  routine  will  usually  perform  about  the  same  using  ana¬ 
lytic  or  finite  difference  derivatives.  For  this  reason  and  because  it  is  more  indi¬ 
cative  of  how  minimization  routines  are  used  in  practice,  we  used  finite 
differences  rather  than  analytic  derivatives  in  the  subsequent  tests.  On  rare 
occasions,  we  noticed  that  our  results  were  impaired  because  the  automatic 
stepsizes  provided  by  rules  like  (3.3)  were  unsatisfactory. 

The  other  two  tests  compared  the  three  step  selection  strategies,  line 
search,  dogleg,  and  hookstep,  in  an  otherwise  identical  algorithm.  The  first  test 
compared  these  strategies  when  using  finite  difference  gradients  and  Hessians, 
while  the  second  compared  the  same  three  strategies  when  using  finite 
difference  gradients  and  secant  (BFGS)  approximations  to  the  Hessian.  In  both 
cases,  the  test  problem  set  was  most  of  the  problems  in  More’,  Garbow,  and 
Hillstrom  [1981]. 

We  present  only  a'brief  simple  summary  of  our  test  results  in  this  paper. 
The  reason  for  this  is  that  we  conclude  from  our  tests  that,  while  the  number  of 
iterations,  function,  and  derivative  evaluations  required  by  the  line  search, 
dogleg,  or  hookstep  methods  to  solve  a  particular  problem  using  the  same 
derivative  information  may  differ  significantly,  the  behavior  of  the  three  step 
selection  strategies  averaged  over  all  the  test  functions  is  very  close.  This  is 
true  whether  comparing  the  three  strategies  using  finite  difference  Hessians,  or 
using  secant  approximation  Hessians.  Furthermore,  each  method  is  best,  and 
each  is  worst,  on  some  problems.  A  more  comprehensive  reporting  of  our  test 
results  would  merely  reinforce  this  conclusion.  Such  a  report  can  be  found  in 
Weiss  [1980];  these  test  results  are  from  an  earlier  version  of  our  code  and  differ 
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occasionally,  but  not  significantly,  from  our  final  results. 

Table  7.2  summarizes  the  comparison  between  line  search,  dogleg,  and 
hookstep  when  using  finite  difference  gradients  and  Hessians.  It  compares  the 
number  of  functions  evaluations  required  by  each  method  to  solve  each  test 
problem,  including  the  function  evaluations  required  for  the  finite  difference 
derivatives.  (This  statistic  is  sometimes  called  "equivalent  function  evalua¬ 
tions' .)  If  we  compared  instead  the  number  of  iterations  required,  or  separated 
derivative  evaluations  from  function  evaluations,  the  results  would  appear  very 
similar.  (We  comment  on  run  times  separately  below.)  For  each  test  problem, 
we  assign  a  "1"  to  the  method  requiring  the  fewest  number  of  function  evalua¬ 
tions  (call  this  number  probmin);  to  the  other  two  methods  we  assign  the 
number  of  function  evaluations  they  required  divided  by  probmin.  If  a  method 
failed  to  solve  the  problem  in  500  iterations  or  gave  up,  an  F  (for  failure)  is 
recorded.  The  final  column  contains  probmin.  For  example,  if  the  line  search, 
dogleg,  and  hookstep  require  11,  10,  and  12  function  evaluations  respectively, 
the  row  of  numbers  would  be  1.1,  1,  1.2,  10.  The  last  column  allows  the  table  to 
show  absolute  as  well  as  relative  data.  The  bottom  three  lines  of  Table  7.2  con¬ 
tain  the  mean,  variance,  and  standard  deviation  of  the  first  three  columns.  Our 
method  of  reporting  results  obscures  the  fact  that  occasionally  one  method 
finds  the  solution  more  accurately  than  another,  due  to  the  variety  of  stopping 
conditions.  Such  differences  were  rare  and  would  not  affect  our  conclusions; 
moreover,  the  difference  in  final  values  of  /  (a:)  in  such  cases  were  never  more 
than  one  order  of  magnitude. 

The  main  conclusion  we  draw  from  Table  7.2  was  stated  above:  there  is  very 
little  overall  difference  between  the  three  step  selection  strategies.  Each 
method  solved  most  but  not  all  of  the  test  problems;  the  difference  in  the 
number  of  failures  probably  is  insignificant.  The  standard  deviations  of  the 
three  columns  suggest  that  the  differences  between  the  three  means  is 


statistically  insignificant.  Indeed,  different  implementations  of  the  three  stra¬ 
tegies  might  lead  to  a  different  ordering  of  the  means. 

Table  7.3  summarizes  in  identical  fashion  the  comparison  between  line 
search,  dogleg,  and  hookstep  using  finite  difference  gradients  and  secant  (BFGS) 
Hessian  approximations.  The  bottom  lines  again  indicate  no  significant 
difference  between  the  three  approaches.  It  may  be  significant  that  the  line 
search  method  had  no  failures  in  this  case  while  the  other  two  methods  had 
several.  There  were  several  cases  when  one  method  required  more  than  the 
minimum  number  of  function  evaluations  but  got  a  significantly  more  accurate 
answer  (more  than  two  orders  of  magnitude  difference  in  the  final  /  (x));  in 
these  cases,  this  method  also  is  given  a  score  of  1  in  Table  7.3,  with  the  actual 
rating  given  in  parentheses. 

The  set  of  test  problems  in  Tables  7.2  and  7.3  is  very  similar  to  the  set  used 
by  Gay  [1982].  We  attempted  to  run  each  minimization  problem  in  More',  Gar- 
bow.  and  Hillstrom  [1981]  from  the  standard  starting  point  zD.  from  10  x0,  and 
from  100  x0.  In  some  cases  the  latter  runs  are  omitted.  Most  often  this  is 
because  the  function  overflowed  at  the  starting  point  and  occasionally  because 
the  problem  was  too  expensive  to  run  or  because  no  method  could  solve  it.  On 
some  problems,  erroneous  finite  difference  derivative  calculations  cause  all  our 
methods  to  fail;  an  example  is  Brown's  badly  scaled  function,  where  the  scales 
of  the  starting  and  optimal  points  differ  by  six  orders  of  magnitude. 

Tables  7.2  and  7.3  do  not  indicate  the  CPU  times  required  by  the  various 
methods.  This  information  is  reported  in  Weiss  [1980]  and  was  accumulated  for 
our  Anal  tests  as  well.  For  the  lower  dimensional  test  problems,  these  times  give 
a  rough  indication  of  the  overhead  cost  per  iteration  of  each  method.  For  the 
larger  problems,  the  times  become  dominated  by  the  evaluations  of  the  test 
function,  since  due  to  our  use  of  finite  differences  there  are  many  function 
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evaluations. 

As  might  be  expected,  in  general  the  line  search  method  requires  less  time 
per  iteration  than  the  dogleg,  and  the  dogleg  less  than  the  hookstep.  The 
differences,  however,  are  not  very  large.  The  dogleg  occasionally  takes  as  much 
as  20-25%  more  time  per  iteration  than  the  line  search,  but  usually  the 
difference  is  10%  or  less.  The  discrepancy  probably  is  mainly  due  to  the  modu¬ 
lar  structure  that  causes  the  dogleg  to  require  more  subroutine  calls  per  itera¬ 
tion  than  the  line  search.  The  hookstep  typically  requires  20-30%  more  time  per 
Iteration  than  the  line  search.  An  additional  cost  in  the  hookstep  is  the  extra 
matrix  factorizations  that  are  sometimes  required.  On  most  practical  problems 
we  have  helped  to  solve,  the  objective  function  was  sufficiently  expensive  to 
evaluate  that  these  differences  in  algorithmic  overhead  were  incidental. 

The  stopping  tolerances  used  in  the  tests  reported  in  Tables  7.2  and  7.3  are 
not  overly  stringent.  All  the  successful  runs  in  Tables  7.2  and  7.3  stop  because 
they  satisfy  either 
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where  the  scaling  parameters  typx (  and  typf  have  the  default  value  1  in  all 
cases.  The  stopping  tolerance  values  were  gradtol  =  10"°  and  steptol  =  10-10, 
which  are  typical  of  the  tolerances  we  see  used  in  practice.  Gay  [1982]  in  his 
testa  used  much  tighter  tolerances.  We  also  ran  the  problems  in  Tables  7.2  and 
7.3  with  gradtol  =  10_l°.  Since  we  are  using  finite  difference  gradients,  this 
tolerance  often  Is  near  the  limit  of  the  attainable  accuracy,  and  significantly 
more  iterations  often  were  required.  The  only  noticeable  difference  in  the  com¬ 
parative  results,  however,  is  that  the  performance  of  the  hookstep  often 
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deteriorated  more  than  the  performance  of  the  other  two  methods.  This  may  be 
because,  close  to  the  solution,  the  hookstep  makes  more  use  of  the  (inaccurate) 
gradient  than  do  the  other  two  methods. 

Finally,  it  is  interesting  to  compare  the  finite  difference  Hessian  method 
with  the  secant  approximation  method  when  the  same  step  selection  strategy  is 
utilized.  In  76  of  102  cases  (34  problems  with  3  step  selection  strategies  each), 
the  finite  difference  Hessian  method  required  fewer  iterations  than  the  secant 
method.  This  confirms  the  conventional  wisdom  that  second  derivative  methods 
are  usually,  but  not  always,  less  expensive  than  secant  methods  if  function 
evaluation  is  sufficiently  inexpensive  that  algorithmic  overhead  is  the  overriding 
cost.  On  the  other  hand,  the  secant  method  required  fewer  total  function 
evaluations  (including  the  function  evaluations  used  by  finite  differences)  in  all 
but  16  of  the  102  cases.  The  16  include  all  9  runs  of  the  Brown-Dennis  problem 
and  only  7  other  cases.  This  again  confirms  the  conventional  wisdom  that  secant 
methods  usually  are  more  efficient  than  finite  difference  Hessian  methods  on 
problems  where  function  evaluation  is  expensive  and  analytic  Hessians  are  not 
available. 


Table  7.1 


Function 


Rosen- 

brock 


Powell 


Wood 


-  Comparison  between  performance  of  algorithms  using 
analytic  vs.  Unite  difference  derivatives 


Method  of  derivative 
evaluations 

Iterations  /  Function  evaluations  by 

Line  Search 

Dogleg 

Hookstep 

Analytic  gradient  and  Hessian 
Finite  diff.  gradient  and  Hessian 

24  /  33 

23  /  30 

21  /  25 

21  /  25 

22  /  27 

21  /  25 

BFGS,  analytic  gradient 

BFGS,  finite  diff.  gradient 

23  /  30 

23  /  30 

44  /  64 

43  /  60 

40  /  60 

41  /  63 

Analytic  gradient  and  Hessian 
Finite  diff.  gradient  and  Hessian 

15  /  16 

15  /  16 

15  /  17 

15  /  17 

15  /  16 

15  /  16 

BFGS,  analytic  gradient 

BFGS.  finite  diff.  gradient 

48  /  54 

44  /  70 

42  /  51 
44/53 

41/48 

41  /  48 

Analytic  gradient  and  Hessian 
Finite  diff.  gradient  and  Hessian 

58  /  92 

56  /  90 

36/40 

37  /  41 

40/46 

43  /  52 

BFGS.  analytic  gradient 

BFGS.  finite  diff.  gradient 

32  /  40 

32  /  40 

42  /  57 

42  /  51 

41  /  52 

48/68 

switched  to  central  difference  gradients 


Table  7.2  -  Comparative  test  results  using  finite  difference 

gradients  and  Hessians 


Function 


Beale 

Helical 

valley 

Gaussian 
Box  3D 
Wood 


Brown  - 
Dennis 

Biggs  Exp 
Watson 
Extended 
Rosen- 
brock 
Extend 
Powell 
Singular 
Penalty  I 


Penalty  II  10 


Variable  10 

Dimension 

Trigono-  10 

metric 

Chebyquad  9 


Starting 

point 


x0 

10x0 

x0 

10*0 

100x0 

x0 

*o 

*o 

10*o 

100*o 

x0 

10*o 

100x0 

*o 

x0 

*o 

10*o 

IOOxq 

10*o 

100*o 

10*o 

100*o 

*o 

'  10*o 

100*o 

*o 

10x0 

100*o 

10*o 
100*o 
*n 


Multiple  of  minimum 
function  evaluations 

Line  Search  Dogleg  Hookstep 

1  1.28  1.40 

r  1  F 

1  1.2 

1.45  1 

1.02  1 

1.06  1 

F  1 

1  1.17 

1  1.08 

1.10  1.06 

1.14  1.01 

108  1.08 

1  1 

F  F 

F  1 

1  1 

1  1.09 

1.26  1 

1  1 

1  1 

1  1 

1 63  1 

1  .1 

1  1.02 

1  1.0B 

1  1.05 

1  1.06 

1  1 

1  1 

1.04  1 

1.88  1 

1  1.20 

1.20  1 

1  OR  1 


Minimum 

function 

evaluations 


60 

222 

135 

187 

186 

17 

192 

711 

664 

888 

138 

252 

366 

17008 

2384 

1610 

3671 

11672 

804 

1069 

1387 

2294 

2901 

3280 

6708 

7166 

7623 

1151 

1379 

1918 

895 

1880 

1075 

2252 


Failures 
Successes 
Mean  of  successes 
Variance  of  successes 
Standard  deviation  or 
successes 


5 

29 

1.147 

0.0368 

0.195 


3 

31 

1.103 

0,0411 

0.206 


2 

32 

1.047 

0.00738 

0.0873 


Table  7.3  -  Comparative  test  results  using  finite  difference 

gradients  and  BFGS  Hessian  approximations 


Function 

n 

Starting 

Multiple  of  minimum 

Minimum 

point 

function  evaluations 

function 

evaluations 

Line  Search 

Dogleg 

Hookstep 

Beale 

2 

*0 

1.04 

1 

1.02 

51 

10x0 

1.10 

1 

1.10 

136 

Helical 

3 

So 

1  (1.11) 

1.04 

1 

117 

valley 

I0x„ 

1.01 

1 

1.05 

126 

IOOxq 

1 

1.09 

1.13 

123 

Gaussian 

3 

s0 

1.20 

1 

1.45 

20 

Box  3D 

3 

*0 

1.25 

1 

1.11 

118 

Wood 

4 

*0 

1 

1.34 

1.65 

172 

I0x0 

1 

1  (1.52) 

1.43 

302 

o 

H 

o 

o 

e-H 

1.16 

1.09 

1 

511 

Brown  - 

4 

*0 

1 

1.03 

1.03 

169 

Dennis 

I0x0 

1 

1.01 

1.01 

30B 

100xD 

1 

1 

1 

458 

Biggs  Exp 

6 

*0 

1 

1.14 

1.07 

309 

Watson 

9 

*0 

1.08 

1.01 

1 

1302 

Extended 

10 

*0 

1  (1.30) 

1.01 

1 

495 

Rosen- 

10x0 

1  (1.65) 

1.34 

1 

680 

brock 

lOOxo 

1  (1.33) 

1 

1.16 

1746 

Extend 

8 

x0 

1.22 

1  (1.08) 

1 

384 

Powell 

lOx0 

1  (1.37) 

1.01 

1  I 

828 

Singular 

IOOxq 

1.01 

1 

1.50 

1298 

Penalty  I 

10 

*0 

1 

F 

F 

1756 

10x0 

1.18 

1 

1.12 

1753 

IOOxq 

1 

F 

F 

2259 

Penalty  11 

10 

*0 

1.08 

1 

1.03 

271 

10x0 

1.78 

2.12 

1 

2531 

100x0 

1.07 

1 

1.05 

5293 

Variable 

10 

*0 

1.13 

1.06 

1 

171 

Dimension 

10x0 

1.06 

1  (1.58) 

1 

544 

100x0 

1 

[1] 

F 

170B 

Trigono- 

10 

*0 

1.03 

1 

1.03 

298 

metric 

10x0 

1 

1.18 

1.18 

985 

100x0 

1.68 

[1.07] 

1 

450 

Chebyquad 

9 

x0 

1.14 

1 

1 

234 

Failures 

0 

0  [+2] 

3 

Successes 

34 

32  [-2] 

31 

Mean  of  successes 

1.094 

1.082 

1.101 

Variance  of  successes 

0.0293 

0.0452 

0.0281 

Standard  deviation  of 

0.174 

0.216 

0.170 

successes 


31 


Number  of  times  found 
significantly  lower 
function  value  than 

5 

3 

0 

other  method  but 
required  more  iterations 

Means  if  parenthesized 

1.145 

1.122 

1.101 

values  used 

Variance 

0.0396 

0.0576 

0.0281 

Standard  deviation 

0.202 

0.244 

0.170 

[— ]  —  Close  to.  but  not  at,  solution 

Excluded  from  summary  statistics 
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